Introduction

The "evaluation of the culturally different" in North America has typically
been interpreted within the context of the evaluation of non-white, non-middle
class minority groups (e.g. Flaugher, 1970). Webster (1968), however, defines
culture in a broader and psychometrically more relevant manner as "... a
complex of typical behaviors or standardized social characteristics peculiar to a
specific group, occupation or profession, sex, age, grade or social class ..."
(p. 552). It is the basic premise of this paper that the probability that a
test will be an inadequate (i.e. invalid) measure increases whenever a test is
designed or developed by one group for use with another group, where there is
(a) evidence that the groups differ with respect to variables related to test
performance, and (b) an inability or unwillingness of the groups to communicate.
Groups could differ on a number of cultural variables; this paper will concern
itself only with age.

A great deal of attention has been paid, in recent years, to the problems
which arise when middle-class whites design tests which are used with non-middle
class, non-white groups. Most of the groups which feel that they have been or
are being evaluated in an inadequate manner have spokesmen, a vocal constituency
and legal recourse. There is, however, a large group of frequently-tested
individuals who have had little to say about the manner or conditions of testing.
This group is, of course, pre-school, primary and elementary age children.

The basic thesis of this paper is that there is a substantial body of
evidence which, when integrated, indicates that children of this age bracket are
being evaluated in a chronically inadequate fashion. Not only are the vast
majority of tests developed with an adult's perspective concerning adequacy of
directions, items, format, etc., but the evaluation of the products is similarly
carried out within the context of adult experience, imperfect memories, and
attempts to introspect oneself into a child's shoes (see, for example, the
evaluations of Hoepfner, Stern and Nummedal (1971) and Hoepfner (1970)). A number
of current measurement texts mention or attempt to delineate important aspects of
the problem. Stanley and Hopkins (1972), for example, devote a chapter to the
psychological and cultural factors that influence performance on measures of
cognitive variables; Cronbach (1970) also devotes a chapter to the problems
associated with the measurement of ability in young children.

A number of studies will be reviewed, including those dealing with adequacy
of directions, test and item format, response mode, coaching and test-taking
skills, serial retesting and anxiety, in an attempt to delineate several
dimensions of the problem.

The Adequacy of Test Directions Given Children 

The evaluation of the adequacy of test directions given children is a
complicated task. This stems from the fact that directions for most procedures
common to pre-school and primary grade settings are read to students. This, of
course, rules out any attempt at determining readability using general procedures
such as the Dale-Chall formula (Dale and Chall, 1948a; 1948b) or procedures
specifically developed for standardized tests (Forbes and Cottle, 1953).

The work of Chomsky (1969), investigating the acquisition of syntax in
children, seems to bear on the problem, however, as does the work of Bormuth,
Carr, Manning and Pearson (1970), the work of Billington (1972) and Tatum's
(1970) study. All showed a developmental sequence in children's ability to deal
with certain definable linguistic structures. There does not seem to be any
published work in the United States directly bearing on the relationship between
the syntactic structure of test and item directions and test performance.
However, in two studies conducted in Great Britain, Riley (1966) found that 8 and
9 year-olds with below-average reading scores tended to score poorly on the
verbal Essential Intelligence Test, and Cookson (1970) observed that children with
reading age equivalents of less than 8.3 on the Schonell Graded Reading Test
were apparently unable to understand the Junior Eysenck Personality Inventory.

A cursory examination of a few immediately available tests revealed use of
syntactic structures difficult for children, in both test items and directions.
The confusion between "ask" and "tell" observed by Chomsky might lead to
a child's misunderstanding such an item as "Do the children forget to ask you
to play with them?"¹ from the California Test of Personality, Primary Form B.
Potential problems with pronoun reference, observed by Chomsky, were noted in
such examples as this quotation from the directions to the usage subtest of the
California Achievement Tests, Lower Primary, Form W:

     Each sentence below has two words placed one above the other. You
     are to make an X on the one which you think is correct in each sentence.²

That the child understand these directions is important, as he may otherwise
assume the X should be placed on the incorrect word. If he is unable to cope with
the indefinite "one," the subtest is not valid for him.

¹Louis P. Thorpe, Willis W. Clark and Ernest W. Tiegs, California Test of
Personality, Primary Form B (California Test Bureau, 1942), p. 4.

²Ernest W. Tiegs and Willis W. Clark, California Achievement Tests, Lower
Primary, Form W (McGraw-Hill, 1957), p. 25.

Billington's findings on the relationships of subordinate to main clauses
were applied to the Gilmore Oral Reading Test, Forms C and D.³ It was observed
that paragraph D-4 contained five simple sentences and three complex sentences,
each consisting of one main clause with a left-branched temporal clause. The
corresponding paragraph of the parallel Form C contained five simple sentences and
two complex sentences: one a main clause with a right-branched adjective clause,
and the other a main clause with a right-branched noun clause from which was
right-branched a temporal clause. Billington's findings suggest, although they
do not show conclusively, that right-branching may be easier for children to
understand than left-branching. Whether the three-clause complex sentence is
equal to two two-clause sentences is an open question, as is the parallelism
of Forms C and D.

The comparative-equal construction found to be particularly difficult for
children by Bormuth, Carr, Manning and Pearson was frequently observed. Its
use in directions is common and seems especially ill-advised. An example of its
use in directions is in the Group Diagnostic Reading Aptitude and Achievement
Tests:

     The teacher will show you some designs on a card. Study these
     designs until the teacher removes the card. Then draw as many of
     them as you can remember.⁴

This construction was frequently used in math subtests, but since it is part
of the vocabulary of mathematics, this seems reasonable. One word problem of
the Stanford Achievement Test, Primary I Battery, Form W, however, states:

     Make crosses on as many pints as you can fill if you empty the
     quart of milk into them.⁵

³John V. Gilmore and E.C. Gilmore, Gilmore Oral Reading Test, Forms C and
D (Harcourt, Brace and World, 1968), p. 4.

⁴Marion Monroe and Eva Edith Sherman, Group Diagnostic Reading Aptitude and
Achievement Tests (C.H. Nelson Co., 1966), p. 11.

⁵Truman L. Kelley, Richard Madden, Eric F. Gardner and Herbert C. Rudman,
Stanford Achievement Test, Primary I Battery (Harcourt, Brace & World, 1964),
Directions for Administering, p. 22.



Although the comparative-equal concept may be necessary to the item, to include
in its wording, also, a conditional clause at the end of a three-clause
construction seems unnecessarily complicated. Form X uses identical wording on a
parallel item. Form Y, however, contains no item of comparable linguistic
complexity in the corresponding subtest.

In general, the questions asked on the tests examined appeared to be of the
type termed "rote" by Bormuth, Carr, Manning and Pearson. However, in the third
grade reading comprehension portion of the Iowa Tests of Basic Skills, Multilevel
Edition for grades 3-9, the following actor-deleted, passively-transformed
question was observed: "Why was the sign put on the door?"⁶ Examination of the
remaining "why" questions of the subtest revealed that, with one exception, all
other "why" questions retained the active voice. The exception was in the
seventh grade portion of the test. No reason for including this difficultly
worded question in the third grade portion of the test is apparent.

Syntactic constructions not produced by all elementary school children were
often observed in tests intended for them. Part F of the Social Adjustment Scale
of the California Test of Personality, Primary Form B, was particularly loaded
in this respect. Its items included the following:⁷

     Is there a nice group of children of your own age in your
     neighborhood with whom you play?

Notice that this is a there-insertion sentence containing a
transformation-produced nominal used as the object of a preposition.

     Are some of the people near your home so mean that you like to do
     things to make them angry?

Here are not one but two adverbial infinitives.

⁶E.F. Lindquist and A.N. Hieronymus, Iowa Tests of Basic Skills, Form 1,
Multilevel Edition (State University of Iowa, 1955), p. 8.

⁷California Test of Personality, Primary Form B, p. 14.

     Are conditions in your neighborhood as good as you would like to have
     them?

This sentence uses the comparative-equal construction as well as an adverbial
infinitive. Children's understanding of the subjunctive mood, which it also
employs, has not yet been investigated. Although others of the twelve parts of
the California Test of Personality contain items with difficult wordings, none
seemed as linguistically difficult as the one in which these items appear.

Measurement procedures designed for the upper elementary grades do require
that the child read and understand test and item directions. An analysis of
seven commonly used batteries which were available for analysis (see Table 1)
was carried out using the Dale-Chall formula with three 100-word samples, drawn
from the beginning, middle and end of the total material read by the test-taker.
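
The readability computation described above can be sketched as follows. This is a minimal illustration, not the procedure actually used for the analysis: it assumes the raw-score formula as published by Dale and Chall in 1948 (0.1579 times the percentage of words outside the Dale list, plus 0.0496 times the average sentence length, plus 3.6365), a caller-supplied set `familiar_words` standing in for the Dale list of familiar words, and a simplified tokenizer that ignores the detailed word-counting rules of the original procedure. The function names are hypothetical.

```python
import re


def dale_chall_raw_score(sample, familiar_words):
    """Raw Dale-Chall score for one text sample (simplified word counting)."""
    # Sentences are approximated by splitting on terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", sample) if s.strip()]
    words = re.findall(r"[a-z']+", sample.lower())
    if not sentences or not words:
        raise ValueError("sample must contain at least one sentence and one word")
    # Words not in the supplied list are treated as unfamiliar.
    unfamiliar = sum(1 for w in words if w not in familiar_words)
    pct_unfamiliar = 100.0 * unfamiliar / len(words)
    avg_sentence_length = len(words) / len(sentences)
    return 0.1579 * pct_unfamiliar + 0.0496 * avg_sentence_length + 3.6365


def mean_raw_score(samples, familiar_words):
    """Average the raw score over several samples (e.g. beginning, middle, end)."""
    scores = [dale_chall_raw_score(s, familiar_words) for s in samples]
    return sum(scores) / len(scores)
```

In use, the three 100-word samples would be passed to `mean_raw_score` together with the familiar-word list; converting the raw score to a grade level requires the published correction table, which is not reproduced here.
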

The results shown in Table 1 indicate that the range of readability on
intermediate level measures is from approximately the 4th to the 6th grade level,
with a mean for tests of close to 5th grade level. When tests containing such
directions are given to 4th (or even many 5th) grade children, the reading
difficulty of the directions may prevent their comprehension of what is desired or
required. This is very probably a problem common to most tests or batteries
designed for a span of several grades at the elementary level.

The Effect of Response Mode on Performance 

The introduction of machine scorable answer sheets in educational
measurement should have been welcome, preventing, among other things, error rates
of the magnitude documented by Phillips and Weathers (1958), who found that 28% of
a sample of teacher-scored standardized tests contained errors. They should
have been welcomed as an economy, allowing the use of relatively more expensive
batteries printed in booklets which would be reusable over several years.
Instead, there was suspicion from the outset that primary children could not
contend with the use of a separate answer sheet.

Cashen and Ramseyer (1969) found differences between the booklet and answer
sheet response modes (favoring the former) to be significant at grades 1 and
2, but not at grade three. Gaffney and Maguire (1971) presented evidence that,
even with a brief orientation to the use of a separate answer sheet and a short
practice session, children below grade four seem to have difficulty. They also
found that with only orientation and no practice, children in grade five or
below had difficulty with the response mode. Ramseyer and Cashen (1971) attempted
to develop a training procedure to enable first and second graders to respond
using a separate answer sheet. A 20 minute practice session involving an
introduction to the use of separate answer sheets was provided experimental Ss. No
significant gains due to training were reported, indicating that even after an
orientation procedure, first and second grade students were unable to contend
with the demands of separate answer sheets. Solomon's (1971) study with
inner-city, culturally deprived fourth graders indicated that there were no
significant differences between test booklet and separate answer sheet formats.
However, there are two points of criticism in this study: six of a total of 45
Ss were dropped from the separate answer sheet condition because they "...
failed to follow instructions" (p. 290), and the children are described as
having "... had previous experience with machine scorable answer sheets." On
both counts, his conclusions ought not to be taken seriously. Muller, Calhoun
and Orling (1972), in an experiment involving third, fourth and sixth grade
students, found the number of marking errors for a separate answer sheet group
to be about three times that for a group responding in a test booklet at each of
the three grade levels, in response to specially developed items which were
assumed to be common knowledge to children at those levels.

To summarize, it appears that the use of separate answer sheets is
inadvisable with children below grade six. The ability to contend with the demands
of this response mode may be teachable, and is probably a function of a number
of "subject variables." This is still a largely unanswered question.

The Effects of Coaching, Practice and Test-Wiseness 

Surprisingly little trustworthy data has been generated on the
question of test familiarization with students in the United States. One of
the better, though somewhat dated, summaries of the British experience was
reported by Vernon (1954), after a symposium dealing with the question. To
briefly summarize their conclusions:

a) It appears that non-verbal test material is more affected than verbal.

b) It appears the typical gain is from .4 to 1 s.d., depending upon the
   difficulty level of the test.

c) The more naive students are, the greater will be the observed gains.

d) The most efficient and effective procedure seems to be a combination
   of coaching and actual timed practice with the test or tests of concern.

e) An important possible effect which had (at that time) not been
   investigated was the reduction of debilitating anxiety.

Slakter, Koehler and Hampton (1970) reported a study which purported to trace
the development of test-wiseness over the grades 5 to 11. Although the results
are interpreted as indicating a linear trend in the development of
test-wiseness over the range of grades studied, several points should be stressed.

a) The data appeared extremely unreliable for Ss below the ninth grade;
   an estimate of the reliability of the TW measure for grades 5-8 would
   be .30.

b) Although it is claimed that grade effects were significant, an
   examination of the data indicates either trivial differences between
   adjacent grades, or a deterioration in performance at higher grades
   (see especially their Figure 1 for grades 5-7 and 9-11).

c) The same test-wiseness measures were used for grades 5-11. To claim
   that a trend in the data indicates a developmental trend in
   test-wiseness behavior reveals startling faith concerning the range of
   applicability of the measures used in the study.

Mann, Taylor, Proger, Dungan and Tidey (1970) investigated the effects of
simple serial retesting with a sample of 7th graders, and found that
significant gains did occur over the four trials given Ss, with the greatest gains
from first to second trial. No instructive feedback was provided students, nor was
there any attempt at test-relevant or content-independent coaching. Although it
was demonstrated that test anxiety level was independent of gains over trials,
there were no data concerning the mechanism producing the observed gains.

Diamond and Evans (1972), in an investigation of the cognitive correlates
of test-wiseness with sixth grade Ss, reported that the median part-whole
intercorrelation for five scales designed to tap different facets of test-wiseness
was .58; the correlation with Lorge-Thorndike IQ (total) was .49, and with the Iowa
Tests of Basic Skills Vocabulary subtest (a good "rough and ready" IQ measure) .55.
The authors conclude that test-wiseness is "not a pervasive skill" and "these
responses have little relationship to a student's general cognitive ability"
(p. 150). I would say that their own data contradict their conclusions, and,
at the very least, any conclusions based on 6-item scales are probably
unreliable.
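
One conventional way to weigh the coefficients quoted above is to square them, which gives the proportion of variance two measures share. This reading is an assumption introduced here for illustration, not an analysis reported by Diamond and Evans or carried out in this paper, and the helper name is hypothetical.

```python
def shared_variance(r):
    """Proportion of variance shared by two measures correlated at r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    return r * r


# Coefficients quoted in the paragraph above.
reported = {
    "median part-whole intercorrelation": 0.58,
    "Lorge-Thorndike IQ (total)": 0.49,
    "ITBS Vocabulary subtest": 0.55,
}

for label, r in reported.items():
    print(f"{label}: r = {r:.2f}, shared variance = {shared_variance(r):.2f}")
```

On this reading the test-wiseness scales share roughly a quarter to a third of their variance with the ability measures, which is hard to describe as "little relationship."
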

Callenbach (1973) has reported what appears to be the only recent study
which seems sensitive to the conclusions and recommendations of the Vernon
symposium, made almost 20 years ago. Twenty-four relatively naive second grade Ss
received eight 30-minute periods of deliberate instruction and practice in
content-independent test-taking skills over a four week period. In a comparison
with control Ss, it was found that the treatment resulted in a significant gain
in the experimental group of about .75 s.d., with a significant difference
between experimental and control groups' posttest scores. Unfortunately (for a
clear-cut interpretation) an analysis of the gains made by control Ss indicated
a significant gain in that group as well. One is left wondering what the
combination of practice and instruction (as recommended by Vernon) would produce in
a study designed to estimate the effects of the factors in a planned, rather
than post-hoc, fashion.

One is drawn to the conclusion arrived at by Vernon almost 20 years ago:
that a combination of instruction and practice can have a statistically and
practically significant impact on naive Ss. When we are discussing elementary
school children, probably the vast majority of students in the lower grades
ought to be regarded as test-naive. It would seem that a deliberate
testing-instructional effort, paralleling Boehm's Test of Basic Concepts (Boehm,
1969) but in the area of test-taking skills and concepts, would be one approach
to the removal of what may be a substantial source of error variance in young
children's data.

Some Additional Evidence That All Is Not Well With Standardized Tests

Hoepfner and Doherty (1973) summed up and analyzed ratings which had been
assigned the tests produced by seven major American test publishers. The MEAN
evaluation system (Hoepfner, 1970; Hoepfner, et al., 1971), assessing
measurement validity, examinee appropriateness, administrative usability and normed
technical excellence, indicated the majority of tests were good with respect to
administrative usability, but fair to poor in all other respects. In other
words, it seems that a great number of tests are available which make it very
easy to generate meaningless data.

Is this a fair indictment, you may well ask. To gain some insight into
the problem, a small study was carried out to determine the normative
equivalents of chance-level performance on an available set of standardized tests
of cognitive variables, designed for elementary school children. No claim is
made for a random sample of measures, but the sample was drawn only on the
basis of amenability to analysis, so deficiencies noted may well be symptomatic
of the entire population of interest. Also, in discussing the notion of
chance-level performance, we are actually speaking as though children responded in a
random fashion to test items. I am reasonably sure this is usually not the
case. However, for reasons already discussed (e.g. inability to comprehend
verbal or written instructions, inability to utilize the response format of a
test, naivete with respect to certain test-taking skills, debilitating anxiety),
the testing experience may be so threatening or confusing to a child that the
end result is the same: performance at or about the level that random responses
would produce.

The question remains: what is the incidence of the problem? A sample
of 13 tests and batteries from files maintained by the Office of Measurement
and Evaluation at the University of Pittsburgh was identified, and for each
scale, subtest or test for which analysis was possible, an expected chance
score and standard deviation for chance scores was calculated, using Gulliksen's
formulae (1950, p. 263), which are, respectively:

$$M_c = \frac{n}{c} \qquad \text{and} \qquad s_c = \frac{\sqrt{n(c-1)}}{c}$$

where M_c is the chance score,
      n is the number of test items,
      c is the number of options per item, and
      s_c is the standard deviation of chance scores.

Gulliksen further recommends that scores lower than M_c + 2s_c be regarded as not
indicating knowledge of the variables under consideration, i.e. that scores
lower than M_c + 2s_c not be taken seriously.
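
The two formulae and the M_c + 2s_c cut-off can be checked numerically with a short sketch. The function name and the example values n = 30 and c = 4 are assumptions chosen for illustration; the item and option counts of the tests analyzed are not given in the text.

```python
import math


def chance_score_stats(n_items, n_options):
    """Return (M_c, s_c, M_c + 2*s_c) for purely random responding.

    M_c = n / c and s_c = sqrt(n * (c - 1)) / c, following the formulae
    attributed to Gulliksen in the text.
    """
    if n_items <= 0 or n_options < 2:
        raise ValueError("need at least one item and two options per item")
    mean = n_items / n_options
    sd = math.sqrt(n_items * (n_options - 1)) / n_options
    return mean, sd, mean + 2 * sd


# Hypothetical subtest: 30 items with 4 options each.
m_c, s_c, cutoff = chance_score_stats(30, 4)
print(f"M_c = {m_c:.1f}, s_c = {s_c:.1f}, M_c + 2s_c = {cutoff:.1f}")
# prints "M_c = 7.5, s_c = 2.4, M_c + 2s_c = 12.2"
```

Under these assumed values a raw score below about 12 would fall inside the band that random marking alone could produce, which is the sense in which such scores are "not taken seriously."
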

Accordingly, Table 2 was constructed, in which normative equivalents
(percentile rank, grade equivalents or IQ) are given for both M_c and M_c + 2s_c.

Table 2

Chance Scores, Their Respective Standard Deviations, and Their
Normative Equivalents on a Sample of Standardized Tests
Designed for Pre-school, Primary and Elementary Pupils

Test and Intended             Chance         Chance    %ile    Grade     IQ      M_c+2s_c   %ile      Grade     IQ
Grade Level(s)                Score (M_c)*   SD (s_c)  Rank    Equiv                        Rank      Equiv

American School Achievement Tests: Primary Battery II, Grades 2 and 3
  Sentence & Word Meaning     7.5 (7/8)      2.4               1.7/1.8           12/13                2.6
  Paragraph Meaning           7.5 (7/8)      2.4               1.7/1.8           12/13                2.3/2.5
  Arith. Computation          10             2.7               2.0               15                   2.6
  Arith. Problems             3              1.5       --      1.9               6          --        2.9
  Language Usage              6              2.8               1.4       --      12                   1.8
  Spelling                    7.5 (7/8)      2.4               1.6/1.7           12/13      --        2.1/2.2

American School Achievement Tests: Intermediate Battery, Grades 4-6
  Sentence & Word Meaning     13             2.7       --      2.3       --      15                   2.9
  Paragraph Meaning           10             2.7               2.6       --      15                   3.4
  Arith. Computation          10             2.7                                                      4.8
  Arith. Problems             5              1.9               3.9       --      9                    4.7
  Spelling                    12.5 (12/13)                     3.4/3.5           18/19                4.0/4.1
  Social Studies              10             2.7               4.4       --      15                   5.3
  Science                     10             2.7               4.5       --      15                   5.4

*When scores in this and the M_c + 2s_c column are not whole numbers they are entered
in their respective columns in the form Xa/Xb, representing the scores above and
below the computed value.

Test and Intended             Chance         Chance    %ile    Grade     IQ      M_c+2s_c   %ile      Grade     IQ
Grade Level(s)                Score (M_c)    SD (s_c)  Rank    Equiv                        Rank      Equiv

California Test of Mental Maturity, 1957 S-Form, Grades K-1
  Spatial Relationships       5.8 (5/6)                40/60                     10/11      99/99
  Verbal Concepts             6              3.0       2                 --      12         70

California Test of Mental Maturity, 1963 S-Form, Level 0, Grades K-Low 1
  Factor I                                   2.3       69/79                     12         97/99
  Factor II                   4              1.8       33                --      8          98
  Factor III                  6              2.0       24                --      10         46
  Factor IV                   2              1.2       31                --      4          69                  --
  Total Test                  19/20          7.3       42/46                     32/35      92/93               --

California Test of Mental Maturity, 1957 S-Form, Elementary, Gr. 4-8
  Factor I                    13.8 (13/14)   3.9       20/20                     21/22      50/60
  [label illegible]           8.8 (8/9)                10/10                     15/16      40/40
  Factor IV                   12.5 (12/13)   3.0       80/80                     18         95/9[?]

Cognitive Abilities Test,
Primary I, Form 1,
Ages 5-0 to 8-0               16             5.3       2                 68      26         16                  84

Kuhlmann-Anderson Test,
1964 Booklet A,
Grade 1-?                     9.5 (9/10)     3.5       4/4               71/72   16/17      9/9                 79/80

Test and Intended             Chance         Chance    %ile    Grade     IQ      M_c+2s_c   %ile      Grade     IQ
Grade Level(s)                Score (M_c)    SD (s_c)  Rank    Equiv                        Rank      Equiv

Otis-Lennon Mental Ability
Test, Elementary I Level,
Mid. Gr. 1-3                  20             3.8       38                95      28         69                  108

Otis-Lennon Mental Ability
Test, Primary I Level,
Grade: Last half K            13.8 (13/14)   3.2       16/19   --        84/86   19/20      38/43     --        95/97

SRA Primary Mental Abilities Test, Gr. K-1
  Verbal Meaning              12.5 (12/13)   3.0               --        [?]/83  18/19                --        [?]/97
  Perceptual Speed            7              2.3       --      --        100     11         --        --        100
  Number Facility             0              0.0                                 0
  Spatial Relations           3              1.5                         67      6                              83
  Total Test                  22/23          6.8       --      --        83/87   35/36      --        --        93/95

SRA Primary Mental Abilities Test, Gr. 2-4
  Verbal Meaning              15             3.4                         68      22                             76
  Spatial Relations           6.3 (6/7)      2.2                         81/85   10/11                          92/95
  Number Facility             1.2 (1/2)      0.9                         74/78   3/4                            81/8[?]
  Perceptual Speed            8.3 (8/9)      2.6                         83/84   13/14                          92/9[?]
  Total Test                  30/33          9.7                         50/[?]  48/51                          66

Stanford Achievement Test, Primary I, Gr. Mid. 1-Beg. 2
  Word Meaning                8.8 (8/9)      2.6       8/12    1.1/1.2           13/14      32/40     1.4/1.5
  Paragraph Meaning           9.5 (9/10)     2.7       22/36   1.4/[?]           14/15      50/50     1.6/1.6

Test and Intended             Chance         Chance    %ile    Grade     IQ      M_c+2s_c   %ile      Grade     IQ
Grade Level(s)                Score (M_c)    SD (s_c)  Rank    Equiv                        Rank      Equiv

Stanford Achievement Test, Primary I, Gr. Mid. 1-Beg. 2
  Vocabulary                  13             2.9       12      1.3       --      19         54        1.7
  Word Study Skills           18.6 (18/19)   3.5       6/11    1.1/1.2           25/26      20/30     1.3/1.4
  Arithmetic                                 3.1       1/4     1.0/1.1           15/16      12/12     1.2/1.2

Stanford Achievement Test, Primary II Battery, Gr.
  Word Meaning                9              2.6       14      1.8       --      14         42        2.5
  Paragraph Meaning           15             3.4       16      1.9       --      22         38        2.4
  Science & Soc. Study
    Concepts                  12.6 (12/13)   2.9       [?]/18  [?]/1.8           18/19      [?]/6[?]  2.7/[?]
  Word Study Skills           18.8 (18/19)   5.1       8/11    1.5/1.6           28/29      36/42     2.3/2.4
  Language                    28.2 (28/29)   3.7       22/28   2.2/2.3           35/36      54/60     2.7/2.8
  Arithmetic Concepts         9.7 (9/10)     3.8       11/16   1.7/1.9           17/18      42/48     2.6/2.7

As is distressingly obvious, for several tests, a substantial proportion of
the norming sample (and, by inference, children like them) performed in a
manner which produced scores similar to what random responding would have produced.

I feel that this is a rather strong indictment of the type of data
typically collected with groups of elementary-age youngsters. I think it's past
time that researchers in education and psychology stopped deluding themselves
and others with rationalizations to account for the common low reliabilities and
predictive validities for data collected from young children. Irregularities
in human development, differences in test content and constructs measured, and
the discrepancy between individual growth curves and a curve based on group
averages are common "explanations" for the unstable data. These and other
factors may well exert an influence on the data.

I suspect, however, that a fundamental reason is that, for large numbers
of primary and elementary age children, the manner and type of testing done
is inappropriate. I further suspect that the problem is one with at least a
partial solution:

a) Develop a measure or measures of "readiness" for standardized testing,

b) Develop training experiences to prepare children for standardized
   testing, and

c) Eliminate the more inadequate tests from consideration in a testing
   program.

This last consideration might involve an examination of tests, evaluations
in sources such as Buros' Mental Measurements Yearbooks, journals and other
published sources, as well as personal evaluations based on, for example, the
latest draft of the APA-AERA-NCME Standards for Development and Use of
Educational and Psychological Tests. Systematic observation and evaluation of
children's behavior on standardized tests might detect the operation of some
of the problems which lead to data like those presented in Table 2. This would
seem a necessary first step in the development of either a) or b) above.